Non-Stationary Bandit Strategy for Rate Adaptation With Delayed Feedback
Authors
Abstract
Similar Resources
Multiarmed Bandit Problems with Delayed Feedback
In this paper we initiate the study of optimization of bandit type problems in scenarios where the feedback of a play is not immediately known. This arises naturally in allocation problems which have been studied extensively in the literature, albeit in the absence of delays in the feedback. We study this problem in the Bayesian setting. In presence of delays, no solution with provable guarante...
Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play one of K arms, each characterized by an unknown reward distribution. Reward realizations are only observed when an arm is selected, and the gambler’s objective is to maximize his cumulative expected earnings over some given horizon of play T. To do this, the gambler needs to acquire information about arms (ex...
On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
A challenging variant of the MABP is the non-stationary bandit problem where the gambler must decide which arm to play while facing the possibility of a changing environment. In this paper, we consider the situation where the distributions of rewards remain constant over epochs and change at unknown time instants. We analyze two algorithms: the discounted UCB and the sliding-window UCB. We esta...
Delayed feedback during sensorimotor learning selectively disrupts adaptation but not strategy use.
In sensorimotor adaptation tasks, feedback delays can cause significant reductions in the rate of learning. This constraint is puzzling given that many skilled behaviors have inherently long delays (e.g., hitting a golf ball). One difference in these task domains is that adaptation is primarily driven by error-based feedback, whereas skilled performance may also rely to a large extent on outcom...
Broadcast Channels with Delayed Finite-Rate Feedback: Predict or Observe?
Most multiuser precoding techniques require accurate transmitter channel state information (CSIT) to maintain orthogonality between the users. Such techniques have proven quite fragile in time-varying channels because the CSIT is inherently imperfect due to estimation and feedback delay, as well as quantization noise. An alternative approach recently proposed by Maddah-Ali and Tse (MAT) allows for...
Journal
Journal title: IEEE Access
Year: 2020
ISSN: 2169-3536
DOI: 10.1109/access.2020.2988671